Latent Space Exploration with StyleGAN2
A conceptual tutorial of what the latent space is!
- Introduction & Disclaimers
- Experiment Layout
- 0. Set-Up (Run only once!)
- 1. A Quick Review of GANs
- 2. Generate Images of People who don't Exist
- 3. Interpolation of Latent Codes
- 4. Facial Image Alignment using Landmark Detection
- 5. Projecting our own Input Images into the Latent Space
- 6. Latent Directions/Controls to modify our projected images
- 7. Bonus: Interactive Widget-App!
Introduction & Disclaimers
Welcome!
This notebook is an introduction to the concept of latent space, using a recent (and amazing) generative network: StyleGAN2.
Here are some great blog posts I found useful when learning about the latent space + StyleGAN2:
- Latent Space Understanding: Ekin Tiu - Understanding Latent Space in Machine Learning
- A technical overview of the StyleGAN2 architecture: Connor Shorten - StyleGAN2
- Overview of GANs + what's changed up to StyleGAN2: akira - From GAN basic to StyleGAN2
In this notebook, we will be experimenting with the following:
- Set-Up (Run only once!)
  - This section simply pulls the small repo containing the files needed to run things!
- A Quick Review of GANs.
  - A quick refresh of: z (latent vectors), generators, and discriminators.
  - GANs vs VAEs
- Generate Images of People who don't Exist
  - Example: Use the official StyleGAN2 repo to create generator outputs.
  - Example: View the latent codes of these generated outputs.
- Interpolation of Latent Codes.
  - How?
  - Example: Use the previous generator outputs' latent codes to "morph" images of people together.
- Facial Image Alignment using Landmark Detection.
  - Why?
  - Example: Aligning (normalizing) our own input images for projection.
- Projecting our own Input Images into the Latent Space.
  - Why, and how?
  - Example: Learning the latent codes of our new aligned input images.
  - Example: Interpolation of projected latent codes. (Similar to Section 3, but with our images!)
- Latent Directions/Controls to modify our projected images.
  - What, and How?
  - Example: Using pre-computed latent directions to alter facial features of our own images.
- Bonus: Interactive Widget-App!
  - Play with latent controls yourself using this little Jupyter app I built using ipywidgets.
Clone Repo and extract contents
!git clone https://github.com/AmarSaini/Epoching_StyleGan2_Setup.git
import shutil
from pathlib import Path
repo_root = Path('Epoching_StyleGan2_Setup/')
# Pull contents out of the repo, into our current directory.
for content in repo_root.iterdir():
    shutil.move(str(content), '.')
shutil.rmtree(repo_root)
Pip install needed packages
# These Python packages are the only thing missing from gradient paperspace's TensorFlow 1.14 container!
!pip install requests
!pip install Pillow
!pip install tqdm
!pip install dlib
I'm going to try to keep this section short, and just go over the needed information to understand the rest of this post:
GANs (Generative Adversarial Networks) consist of two models:
- The Generator: A model that converts a latent code into some kind of output (an image of a person, in our case).
- The Discriminator: A model that determines whether some input (an image of a person) is real or fake.
  - Real: An image from the original dataset.
  - Fake: An image from the Generator.
The input to a Generator is a latent code, a vector of numbers if you will (such as a vector of 512 numbers).
- During training, the latent code is randomly sampled (i.e. a random vector of 512 numbers).
- When this latent code is randomly sampled, we can call it a latent random variable, as shown in the figure below.
- This magical latent code holds information that will allow the Generator to create a specific output.
- If you can find a latent code for a particular input, you can represent it with smaller amounts of data! (Such as representing a picture of someone with only a latent vector of 512 numbers, as opposed to the original image size.)
Note: The Generator actually never sees the input images, hence we don't have a way to automatically convert an image into its corresponding latent code! Teaser: That's what projection is for, section 5 :)
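To put the "smaller amounts of data" point in numbers, here is a small sketch. It assumes a 1024x1024 RGB output (the resolution of the FFHQ config-f model loaded later in this post) and counts raw values only, ignoring file compression such as JPEG.

```python
# Assumption: a 1024x1024 RGB image, one value per channel per pixel.
image_height, image_width, channels = 1024, 1024, 3
latent_size = 512  # length of the latent code described above

values_in_image = image_height * image_width * channels
ratio = values_in_image // latent_size

print(values_in_image)  # 3145728
print(ratio)            # 6144
```

So under these assumptions, the latent code is several thousand times smaller than the raw pixel grid it stands for.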

#collapse-hide
import sys
sys.path.append('stylegan2/')
from stylegan2 import pretrained_networks
from stylegan2 import dnnlib
from stylegan2.dnnlib import tflib
from pathlib import Path
from PIL import Image
import pickle
import numpy as np
import ipywidgets as widgets
from tqdm import tqdm
model_path = 'gdrive:networks/stylegan2-ffhq-config-f.pkl'
fps = 20
results_size = 400
#collapse-hide
# Code to load the StyleGAN2 Model
def load_model():
    _G, _D, Gs = pretrained_networks.load_networks(model_path)
    noise_vars = [var for name, var in Gs.components.synthesis.vars.items() if name.startswith('noise')]
    Gs_kwargs = dnnlib.EasyDict()
    Gs_kwargs.output_transform = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
    Gs_kwargs.randomize_noise = False
    return Gs, noise_vars, Gs_kwargs

# Generate images given a random seed (Integer)
def generate_image_random(rand_seed):
    rnd = np.random.RandomState(rand_seed)
    z = rnd.randn(1, *Gs.input_shape[1:])
    tflib.set_vars({var: rnd.randn(*var.shape.as_list()) for var in noise_vars})
    images = Gs.run(z, None, **Gs_kwargs)
    return images, z

# Generate images given a latent code ( vector of size [1, 512] )
def generate_image_from_z(z):
    images = Gs.run(z, None, **Gs_kwargs)
    return images
Let's go ahead and start generating some outputs!
#collapse-show
# Loading the StyleGAN2 Model!
Gs, noise_vars, Gs_kwargs = load_model()
#collapse-show
# Ask the generator to make an output, given a random seed number: 42
images, latent_code1 = generate_image_random(42)
image1 = Image.fromarray(images[0]).resize((results_size, results_size))
latent_code1.shape
The shape of latent_code1 is (1, 512). This means that the numbers inside latent_code1 can be used to create the image below!
image1
Let's make another image!
#collapse-show
# Ask the generator to make an output, given a random seed number: 1234
images, latent_code2 = generate_image_random(1234)
image2 = Image.fromarray(images[0]).resize((results_size, results_size))
latent_code2.shape
latent_code1[0][:5], latent_code2[0][:5]
The shape of latent_code2 is also (1, 512). However, the two codes are not the same! You can see this in the first five values printed by the previous cell :). Below is the image generated from latent_code2.
image2
So what's the big deal? We have two codes to make two people that don't even exist right? Well, the cool thing about the latent space is that you can "traverse" through it!
Since the latent space is a compressed representation of some data, things that are similar in appearance should be "close" to each other in the latent space.
If the latent space is well developed, we can actually transition between points in this space and create intermediate outputs of the endpoints!
In other words... we can morph two people together! See gif below for an example!

#collapse-show
def linear_interpolate(code1, code2, alpha):
    # alpha=1 returns code1, alpha=0 returns code2
    return code1 * alpha + code2 * (1 - alpha)
Now let's do this on the examples we just generated! :D
Let's interpolate halfway between latent_code1 and latent_code2.
interpolated_latent_code = linear_interpolate(latent_code1, latent_code2, 0.5)
interpolated_latent_code.shape
We took 50% of latent_code1 and 50% of latent_code2 (alpha=0.5), and summed them together! Below is the resulting image.
images = generate_image_from_z(interpolated_latent_code)
Image.fromarray(images[0]).resize((results_size, results_size))
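As a quick sanity check of what this weighting computes, here is a tiny sketch with 3-number "codes" instead of 512. The function name blend and the toy vectors are made up for illustration; the formula is the same one used in linear_interpolate.

```python
import numpy as np

def blend(code1, code2, alpha):
    # Same weighting as linear_interpolate: alpha of code1, (1 - alpha) of code2.
    return code1 * alpha + code2 * (1 - alpha)

# Toy stand-ins for latent codes (shape (1, 3) instead of (1, 512)).
code_a = np.array([[0.0, 2.0, -4.0]])
code_b = np.array([[1.0, 0.0, 4.0]])

only_a = blend(code_a, code_b, 1.0)   # alpha=1: identical to code_a
halfway = blend(code_a, code_b, 0.5)  # alpha=0.5: element-wise average
only_b = blend(code_a, code_b, 0.0)   # alpha=0: identical to code_b

print(only_a, halfway, only_b)
```

Every interpolated code keeps the shape of its endpoints, which is why it can be fed to the generator exactly like a sampled code.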
Let's also make a cool interpolation animation. It'll help with visualizing the effect of interpolating from alpha=0 to alpha=1.
#collapse-show
output_gifs_path = Path('output_gifs')
# Make Output Gifs folder if it doesn't exist.
if not output_gifs_path.exists():
    output_gifs_path.mkdir()
#collapse-hide
def get_concat_h(im1, im2):
    # Paste two images side by side (im1 on the left, im2 on the right).
    dst = Image.new('RGB', (im1.width + im2.width, im1.height))
    dst.paste(im1, (0, 0))
    dst.paste(im2, (im1.width, 0))
    return dst

def make_latent_interp_animation(code1, code2, img1, img2, num_interps):
    step_size = 1.0/num_interps
    all_imgs = []
    amounts = np.arange(0, 1, step_size)
    for alpha in tqdm(amounts):
        interpolated_latent_code = linear_interpolate(code1, code2, alpha)
        images = generate_image_from_z(interpolated_latent_code)
        interp_latent_image = Image.fromarray(images[0]).resize((400, 400))
        frame = get_concat_h(img1, interp_latent_image)
        frame = get_concat_h(frame, img2)
        all_imgs.append(frame)
    save_name = output_gifs_path/'latent_space_traversal.gif'
    all_imgs[0].save(save_name, save_all=True, append_images=all_imgs[1:], duration=1000/fps, loop=0)

make_latent_interp_animation(latent_code1, latent_code2, image1, image2, num_interps=200)
The animation is saved to output_gifs/latent_space_traversal.gif :)

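We interpolated between latent_code1 and latent_code2 by slowly changing alpha from 0 to 1 (increasing alpha by 1/200 per iteration and stopping just short of 1.0). With linear_interpolate as written, alpha=0 yields latent_code2 and alpha=1 yields latent_code1.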
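The timing of the resulting gif follows from two numbers used above. A short worked calculation, assuming one frame per interpolation step:

```python
num_interps = 200  # frames requested in the call above
fps = 20           # same value as the fps variable set earlier

frame_duration_ms = 1000 / fps                          # passed as duration= when saving
gif_length_s = num_interps * frame_duration_ms / 1000   # total playback time

print(frame_duration_ms)  # 50.0
print(gif_length_s)       # 10.0
```

Raising num_interps makes the morph smoother but longer; raising fps makes it play faster.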
Ok so this is all fun and stuff right? How could we play around with our own images, instead of random people that don't exist?
Well, we first have to project our own images into this latent space.
To align (normalize) our images for StyleGAN2, we need to use a landmark detection model. This will automatically find the facial keypoints of interest, and crop/rotate accordingly.
Below is an example!
To try this with your own photos, go to the imgs/ folder and delete the example images, Jeremy_Howard.jpg and Obama.jpg. Then upload 2 of your own!
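How detected keypoints turn into a rotation can be sketched as follows. This is a simplified, hypothetical illustration that only levels the line between two eye centers; the eye coordinates are made up, and the align_face helper used in this notebook may do more than this (such as cropping and rescaling).

```python
import numpy as np

def eye_leveling_rotation(left_eye, right_eye):
    """Return (rotation matrix, center, angle) that makes the eye line horizontal."""
    left_eye = np.asarray(left_eye, dtype=float)
    right_eye = np.asarray(right_eye, dtype=float)
    dx, dy = right_eye - left_eye
    angle = np.arctan2(dy, dx)  # tilt of the eye line, in radians
    c, s = np.cos(-angle), np.sin(-angle)
    rotation = np.array([[c, -s], [s, c]])  # rotate by -angle to undo the tilt
    center = (left_eye + right_eye) / 2.0
    return rotation, center, angle

def rotate_points(points, rotation, center):
    # Rotate (x, y) points around the given center.
    points = np.asarray(points, dtype=float)
    return (points - center) @ rotation.T + center

# Made-up eye centers (x, y) for a slightly tilted face.
eyes = np.array([[30.0, 52.0], [70.0, 44.0]])
rotation, center, angle = eye_leveling_rotation(eyes[0], eyes[1])
leveled_eyes = rotate_points(eyes, rotation, center)
print(np.degrees(angle), leveled_eyes)
```

After the rotation both eyes share the same y coordinate, which is the "normalized" pose the projection step expects.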
#collapse-hide
# One-Time Download of Facial Landmark Detection Model Weights
if not Path('shape_predictor_68_face_landmarks.dat').exists():
    !wget http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
    !bzip2 -dv shape_predictor_68_face_landmarks.dat.bz2
#collapse-show
orig_img_path = Path('imgs')
aligned_imgs_path = Path('aligned_imgs')
# Make Aligned Images folder if it doesn't exist.
if not aligned_imgs_path.exists():
    aligned_imgs_path.mkdir()
orig_img_path, aligned_imgs_path
#collapse-show
from align_face import align_face
# Align all of our images using a landmark detection model!
all_imgs = list(orig_img_path.iterdir())
for img in all_imgs:
    align_face(str(img)).save(aligned_imgs_path/('aligned_'+img.name))
Let's load the original + aligned images into Jupyter!
#collapse-show
aligned_img_set = list(aligned_imgs_path.iterdir())
aligned_img_set.sort()
aligned_img_set = [Image.open(x) for x in aligned_img_set]
orig_img_set = list(orig_img_path.iterdir())
orig_img_set.sort()
orig_img_set = [Image.open(x) for x in orig_img_set]
get_concat_h(orig_img_set[0], aligned_img_set[0])
get_concat_h(orig_img_set[1], aligned_img_set[1])
You can either manually restart your kernel, or run the below cell:
#collapse-show
# Automatically restart the kernel by running this cell
import IPython
IPython.Application.instance().kernel.do_shutdown(True)
#collapse-hide
import sys
sys.path.append('stylegan2/')
from stylegan2 import pretrained_networks
from stylegan2 import dnnlib
from stylegan2.dnnlib import tflib
from pathlib import Path
from PIL import Image
import pickle
import numpy as np
import ipywidgets as widgets
from tqdm import tqdm
model_path = 'gdrive:networks/stylegan2-ffhq-config-f.pkl'
fps = 20
results_size = 400
Note: the commands below use the -W ignore option when executing python on the command line, which suppresses Python warnings.
# collapse-hide
!python -W ignore stylegan2/dataset_tool.py create_from_images datasets_stylegan2/custom_imgs aligned_imgs/
- Projecting an image into the latent space basically means: Let's figure out what latent code (512 numbers) will cause the generator to make an output that looks like our image.
- The question is, how do we figure out the latent code? With VAEs (variational autoencoders), we just throw our image through the encoder and we get our latent code just like that!
- With GANs, we don't necessarily have a direct way to extract latent codes from an input image, but we can optimize for it.
- In a nutshell, here's how we can optimize for a latent code for given input images:
For as many iterations as you'd like, do:
- Ask the generator to generate some output from a starting latent vector.
- Take the generator's output image, and take your target image, and put them both through a VGG16 model (image feature extractor).
  - Take the generator's output image features from the VGG16.
  - Take the target image features from the VGG16.
- Compute the loss on the difference of features!
- Backprop: use the gradient of that loss to update the latent vector, so the next generated output is a little closer to the target.
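The loop above can be sketched in plain NumPy. This is a toy illustration under loud assumptions: a fixed random linear map stands in for the StyleGAN2 generator, another stands in for the VGG16 feature extractor, and the gradient is written out by hand instead of using backprop. All names here (toy_generator, toy_features, project) are hypothetical, and this is not the code that the projector script runs.

```python
import numpy as np

rng = np.random.RandomState(0)
LATENT_DIM, IMAGE_DIM, FEATURE_DIM = 4, 64, 64

# Toy stand-ins (assumptions): fixed linear "generator" and "feature extractor".
G = rng.randn(IMAGE_DIM, LATENT_DIM) / np.sqrt(LATENT_DIM)
F = rng.randn(FEATURE_DIM, IMAGE_DIM) / np.sqrt(IMAGE_DIM)

def toy_generator(z):
    return G @ z

def toy_features(image):
    return F @ image

def project(target_image, num_steps=300, learning_rate=0.5):
    """Optimize a latent vector so its generated features match the target's."""
    target_features = toy_features(target_image)
    z = np.zeros(LATENT_DIM)  # starting latent vector
    losses = []
    for _ in range(num_steps):
        # Generate, extract features, and compare against the target features.
        diff = toy_features(toy_generator(z)) - target_features
        losses.append(float(np.mean(diff ** 2)))
        # Hand-derived gradient of the mean squared feature difference w.r.t. z.
        grad = (2.0 / FEATURE_DIM) * (F @ G).T @ diff
        z = z - learning_rate * grad  # only the latent moves; G and F stay fixed
    return z, losses

# A "target image" that the toy generator is able to produce exactly.
true_z = rng.randn(LATENT_DIM)
target = toy_generator(true_z)
found_z, losses = project(target)
print(losses[0], losses[-1])
```

The key design point carries over to the real setting: the generator and the feature extractor are frozen, and the latent vector is the only thing being updated.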
# collapse-hide
tot_aligned_imgs = 2
# collapse-hide
!python -W ignore stylegan2/epoching_custom_run_projector.py project-real-images --network=$model_path \
--dataset=custom_imgs --data-dir=datasets_stylegan2 --num-images=$tot_aligned_imgs --num-snapshots 500
#collapse-hide
def get_concat_h(im1, im2):
    dst = Image.new('RGB', (im1.width + im2.width, im1.height))
    dst.paste(im1, (0, 0))
    dst.paste(im2, (im1.width, 0))
    return dst

def make_project_progress_gifs():
    # Use the most recent run inside results/.
    all_result_folders = list(Path('results/').iterdir())
    all_result_folders.sort()
    last_result_folder = all_result_folders[-1]
    for img_num in range(tot_aligned_imgs):
        all_step_pngs = [x for x in last_result_folder.iterdir() if x.name.endswith('png') and 'image{0:04d}'.format(img_num) in x.name]
        all_step_pngs.sort()
        # After sorting, the last png is the target image; the rest are optimization steps.
        target_image = Image.open(all_step_pngs[-1]).resize((results_size, results_size))
        all_concat_imgs = []
        for step_img_path in all_step_pngs[:-1]:
            step_img = Image.open(step_img_path).resize((results_size, results_size))
            all_concat_imgs.append(get_concat_h(target_image, step_img))
        all_concat_imgs[0].save('output_gifs/image{0:04d}_project_progress.gif'.format(img_num), save_all=True, append_images=all_concat_imgs[1:], duration=1000/fps, loop=0)

make_project_progress_gifs()
The animations are saved to output_gifs/image####_project_progress.gif :)


Let's look at the optimized latent codes we have acquired through this projection process!
#collapse-hide
def get_final_latents():
    # Use the most recent run inside results/.
    all_results = list(Path('results/').iterdir())
    all_results.sort()
    last_result = all_results[-1]
    latent_files = [x for x in last_result.iterdir() if 'final_latent_code' in x.name]
    latent_files.sort()
    all_final_latents = []
    for file in latent_files:
        with open(file, mode='rb') as latent_pickle:
            all_final_latents.append(pickle.load(latent_pickle))
    return all_final_latents

latent_codes = get_final_latents()
len(latent_codes), latent_codes[0].shape, latent_codes[1].shape
Notice that each projected latent code has shape (1, 18, 512), instead of the (1, 512) shape we saw earlier on the generated (fake) examples. This is due to one of the architecture designs of StyleGAN2: it re-uses the base latent vector at 18 different levels in the generator, and allowing small deviations between those 18 copies supports variance in style. During training, just one static latent vector of the shape (1, 512) is used. For a more detailed explanation, check out the recommended technical StyleGAN2 overview blog posts mentioned in the introduction! :)
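The relationship between the two shapes can be sketched in NumPy. This is only an illustration, assuming the 18 levels start out as identical copies of one 512-number vector; it is not the StyleGAN2 code itself, and the random vector is a stand-in for a real latent code.

```python
import numpy as np

rng = np.random.RandomState(0)
base_code = rng.randn(1, 512)  # stand-in for a single latent vector
num_layers = 18                # number of levels, taken from the shape above

# Repeat the same vector once per level: (1, 512) -> (1, 18, 512).
per_layer_code = np.tile(base_code[:, np.newaxis, :], (1, num_layers, 1))
print(per_layer_code.shape)  # (1, 18, 512)

# A projected code is free to differ per level, e.g. a small change at the last level only.
tweaked_code = per_layer_code.copy()
tweaked_code[0, num_layers - 1] += 0.1
```

A projected code like tweaked_code no longer collapses back to a single (1, 512) vector, which is why the later cells feed it to the synthesis part of the generator directly.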
#collapse-hide
def load_model():
    _G, _D, Gs = pretrained_networks.load_networks(model_path)
    noise_vars = [var for name, var in Gs.components.synthesis.vars.items() if name.startswith('noise')]
    Gs_kwargs = dnnlib.EasyDict()
    Gs_kwargs.output_transform = dict(func=tflib.convert_images_to_uint8, nchw_to_nhwc=True)
    Gs_kwargs.randomize_noise = False
    return Gs, noise_vars, Gs_kwargs

# Generate images given a projected latent code ( shape [1, 18, 512] )
def generate_image_from_projected_latents(latent_vector):
    images = Gs.components.synthesis.run(latent_vector, **Gs_kwargs)
    return images
#collapse-show
# Loading the StyleGAN2 Model!
Gs, noise_vars, Gs_kwargs = load_model()
Let's check out what our images look like from our latent codes!
#collapse-hide
output_gifs_path = Path('output_gifs/')
aligned_imgs_path = Path('aligned_imgs/')
aligned_img_set = list(aligned_imgs_path.iterdir())
aligned_img_set.sort()
aligned_img_set = [Image.open(x) for x in aligned_img_set]
#collapse-show
# Ask the generator to make an output, given a latent code we found from the projection process.
images = generate_image_from_projected_latents(latent_codes[0])
recreated_img1 = Image.fromarray(images[0]).resize((results_size, results_size))
orig_img1 = aligned_img_set[1].resize((results_size, results_size))
get_concat_h(orig_img1, recreated_img1)
#collapse-show
# Ask the generator to make an output, given a latent code we found from the projection process.
images = generate_image_from_projected_latents(latent_codes[1])
recreated_img2 = Image.fromarray(images[0]).resize((results_size, results_size))
orig_img2 = aligned_img_set[0].resize((results_size, results_size))
get_concat_h(orig_img2, recreated_img2)
Now let's re-run the interpolation animation we did back in section 3, but this time with our own projected latent codes!
#collapse-hide
def linear_interpolate(code1, code2, alpha):
    # alpha=1 returns code1, alpha=0 returns code2
    return code1 * alpha + code2 * (1 - alpha)

def make_latent_interp_animation_real_faces(code1, code2, img1, img2, num_interps):
    step_size = 1.0/num_interps
    all_imgs = []
    amounts = np.arange(0, 1, step_size)
    for alpha in tqdm(amounts):
        interpolated_latent_code = linear_interpolate(code1, code2, alpha)
        images = generate_image_from_projected_latents(interpolated_latent_code)
        interp_latent_image = Image.fromarray(images[0]).resize((400, 400))
        # The morph starts at code2 (alpha=0), so img2 goes on the left and img1 on the right.
        frame = get_concat_h(img2, interp_latent_image)
        frame = get_concat_h(frame, img1)
        all_imgs.append(frame)
    save_name = output_gifs_path/'projected_latent_space_traversal.gif'
    all_imgs[0].save(save_name, save_all=True, append_images=all_imgs[1:], duration=1000/fps, loop=0)

make_latent_interp_animation_real_faces(latent_codes[0], latent_codes[1], recreated_img1, recreated_img2, num_interps=200)
The animation is saved to output_gifs/projected_latent_space_traversal.gif :)

Time to be an astronaut and explore space! Well, the hidden (latent) kind of space. Alright... I admit, that joke was blatant. Sorry for the puns, I'm just trying to relate things together. Ok ok ok, you need some space, got it.
There are ways to learn latent directions (both supervised, and unsupervised) in the latent space to control features. People have already open-sourced some directional latent vectors for StyleGAN2 that allow us to "move" in the latent space and control a particular feature.
- Supervised Method of learning these latent directions:
"We first collect multiple samples (image + latent) from our model and manually classify the images for our target attribute (e.g. smiling VS not smiling), trying to guarantee proper class representation balance. We then train a model to classify or regress on our latents and manual labels. At this point we can use the learned functions of these support models as transition directions" - 5agado's Blog
- Unsupervised Method: Unsupervised Discovery of Interpretable Directions in the GAN Latent Space
To move in a latent direction we can do the following operation:
latent_code = latent_code + latent_direction * magnitude
- latent_code is our latent code, such as our recently optimized latent code!
- latent_direction is a learnt directional vector of shape (18, 512). This vector tells you where to move in the latent space to control a feature, but not how much to move.
- magnitude is the amount to move in the direction of latent_direction.
This means we can create more interpolations in the latent space! Yay, more animations :). Only this time, rather than interpolating between two points, we are slowly moving in a specific latent_direction.
- Instead of mixing two latent codes together, we slowly add more magnitude to our base latent code, and observe how it changes with respect to magnitude!
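Here is a minimal NumPy sketch of that operation. The arrays are random stand-ins (not a real projected code or a real learnt direction), and move_in_direction is a made-up helper name; the point is only to show the shapes and the effect of magnitude.

```python
import numpy as np

rng = np.random.RandomState(0)
latent_code = rng.randn(1, 18, 512)    # stand-in for a projected latent code
latent_direction = rng.randn(18, 512)  # stand-in for a learnt direction

def move_in_direction(code, direction, magnitude):
    # (1, 18, 512) + (18, 512) broadcasts over the leading axis.
    return code + direction * magnitude

# Sweep the magnitude from negative to positive, as the animations do.
magnitudes = np.linspace(-2.0, 2.0, 5)
moved_codes = [move_in_direction(latent_code, latent_direction, m) for m in magnitudes]

print(moved_codes[0].shape)  # (1, 18, 512)
```

A magnitude of 0 leaves the code unchanged, negative values move against the direction, and positive values move along it.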
#collapse-show
def get_control_latent_vectors(path):
    # Map each direction's file name (without '.npy') to its loaded array.
    files = [x for x in Path(path).iterdir() if str(x).endswith('.npy')]
    latent_vectors = {f.name[:-4]:np.load(f) for f in files}
    return latent_vectors
latent_controls = get_control_latent_vectors('stylegan2directions/')
len(latent_controls), latent_controls.keys(), latent_controls['age'].shape
#collapse-hide
def make_latent_control_animation(feature, start_amount, end_amount, step_size, person):
    # Uses the globals latent_code_to_use and image_to_use, set in the cells below.
    all_imgs = []
    # np.linspace needs an integer number of samples.
    num_steps = int(round(abs(end_amount-start_amount)/step_size))
    amounts = np.linspace(start_amount, end_amount, num_steps)
    for amount_to_move in tqdm(amounts):
        modified_latent_code = np.array(latent_code_to_use)
        modified_latent_code += latent_controls[feature]*amount_to_move
        images = generate_image_from_projected_latents(modified_latent_code)
        latent_img = Image.fromarray(images[0]).resize((results_size, results_size))
        all_imgs.append(get_concat_h(image_to_use, latent_img))
    save_name = output_gifs_path/'{0}_{1}.gif'.format(person, feature)
    all_imgs[0].save(save_name, save_all=True, append_images=all_imgs[1:], duration=1000/fps, loop=0)
#collapse-hide
latent_code_to_use = latent_codes[1]
image_to_use = recreated_img2
make_latent_control_animation(feature='age', start_amount=-10, end_amount=10, step_size=0.1, person='jeremy')
#collapse-hide
latent_code_to_use = latent_codes[0]
image_to_use = recreated_img1
make_latent_control_animation(feature='age', start_amount=-5, end_amount=5, step_size=0.1, person='obama')
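The animations are saved to output_gifs/person_feature.gif, where person and feature are the values passed to make_latent_control_animation :)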





#input-hide
from IPython.display import display

def apply_latent_controls(self):
    image_outputs = controller.children[0]
    feature_sliders = controller.children[1]
    # The last two children are the Generate and Reset buttons.
    slider_hboxes = feature_sliders.children[:-2]
    latent_movements = [(x.children[1].value, x.children[0].value) for x in slider_hboxes]
    modified_latent_code = np.array(latent_code_to_use)
    for feature, amount_to_move in latent_movements:
        modified_latent_code += latent_controls[feature]*amount_to_move
    images = generate_image_from_projected_latents(modified_latent_code)
    latent_img = Image.fromarray(images[0]).resize((400, 400))
    latent_img_output = image_outputs.children[1]
    with latent_img_output:
        latent_img_output.clear_output()
        display(latent_img)

def reset_latent_controls(self):
    image_outputs = controller.children[0]
    feature_sliders = controller.children[1]
    slider_hboxes = feature_sliders.children[:-2]
    for x in slider_hboxes:
        x.children[0].value = 0
    latent_img_output = image_outputs.children[1]
    with latent_img_output:
        latent_img_output.clear_output()
        display(image_to_use)

def create_interactive_latent_controller():
    orig_img_output = widgets.Output()
    with orig_img_output:
        orig_img_output.clear_output()
        display(image_to_use)
    latent_img_output = widgets.Output()
    with latent_img_output:
        latent_img_output.clear_output()
        display(image_to_use)
    image_outputs = widgets.VBox([orig_img_output, latent_img_output])
    generate_button = widgets.Button(description='Generate', layout=widgets.Layout(width='75%', height='10%'))
    generate_button.on_click(apply_latent_controls)
    reset_button = widgets.Button(description='Reset Latent Controls', layout=widgets.Layout(width='75%', height='10%'))
    reset_button.on_click(reset_latent_controls)
    # One slider + label per available latent direction.
    feature_sliders = []
    for feature in latent_controls:
        label = widgets.Label(feature)
        slider = widgets.FloatSlider(min=-50, max=50)
        feature_sliders.append(widgets.HBox([slider, label]))
    feature_sliders.append(generate_button)
    feature_sliders.append(reset_button)
    feature_sliders = widgets.VBox(feature_sliders)
    return widgets.HBox([image_outputs, feature_sliders])
#collapse-show
latent_code_to_use = latent_codes[0]
image_to_use = recreated_img1
controller = create_interactive_latent_controller()
controller
#collapse-show
latent_code_to_use = latent_codes[1]
image_to_use = recreated_img2
controller = create_interactive_latent_controller()
controller







